Learning and Refining of Privileged Information-based RNNs for Action Recognition from Depth Sequences
Existing RNN-based approaches for action recognition from depth sequences
require either skeleton joints or hand-crafted depth features as inputs. An
end-to-end manner, mapping from raw depth maps to action classes, is
non-trivial to design due to the fact that: 1) single channel map lacks texture
thus weakens the discriminative power; 2) relatively small set of depth
training data. To address these challenges, we propose to learn an RNN driven
by privileged information (PI) in three-steps: An encoder is pre-trained to
learn a joint embedding of depth appearance and PI (i.e. skeleton joints). The
learned embedding layers are then tuned in the learning step, aiming to
optimize the network by exploiting PI in the form of a multi-task loss. However,
exploiting PI as a secondary task provides little help to improve the
performance of the primary task (i.e. classification) due to the gap between
them. Finally, a bridging matrix is defined to connect two tasks by discovering
latent PI in the refining step. Our PI-based classification loss maintains a
consistency between latent PI and predicted distribution. The latent PI and
network are iteratively estimated and updated in an expectation-maximization
procedure. The proposed learning process provides greater discriminative power
to model subtle depth differences, while helping to avoid overfitting the scarce
training data. Our experiments show significant performance gains over
state-of-the-art methods on three public benchmark datasets and our newly
collected Blanket dataset. Comment: conference cvpr 201
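The learning step's multi-task objective can be sketched as a primary classification loss plus a secondary PI-regression term. This is an illustrative numpy sketch, assuming a softmax cross-entropy primary loss, an L2 loss on the predicted skeleton joints, and a hypothetical weight `alpha`; the paper's exact formulation may differ:

```python
import numpy as np

def multitask_loss(class_logits, label, pi_pred, pi_target, alpha=0.5):
    """Hypothetical multi-task objective: cross-entropy on the action
    label plus an L2 term on predicted privileged information
    (skeleton joints), weighted by alpha."""
    # Softmax cross-entropy for the primary classification task
    z = class_logits - class_logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    ce = -log_probs[label]
    # L2 regression loss for the secondary PI task
    l2 = np.mean((pi_pred - pi_target) ** 2)
    return ce + alpha * l2
```

Lowering `alpha` de-emphasizes the PI term; the paper's refining step goes further by bridging the two tasks through latent PI rather than weighting alone.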
Occlusion-aware Hand Pose Estimation Using Hierarchical Mixture Density Network
Learning and predicting the pose parameters of a 3D hand model given an
image, such as locations of hand joints, is challenging due to large viewpoint
changes and articulations, and severe self-occlusions exhibited particularly in
egocentric views. Both feature learning and prediction modeling have been
investigated to tackle the problem. Though effective, most existing
discriminative methods yield a single deterministic estimation of target poses.
Due to their intrinsically single-valued mapping, they fail to adequately handle
self-occlusion, where occluded joints present multiple modes. In this
paper, we tackle the self-occlusion issue and provide a complete description of
observed poses given an input depth image by a novel method called hierarchical
mixture density networks (HMDN). The proposed method leverages the
state-of-the-art hand pose estimators based on Convolutional Neural Networks to
facilitate feature learning, while it models the multiple modes in a two-level
hierarchy to reconcile single-valued and multi-valued mapping in its output.
The whole framework with a mixture of two differentiable density functions is
naturally end-to-end trainable. In the experiments, HMDN produces interpretable
and diverse candidate samples, and significantly outperforms the
state-of-the-art methods on two benchmarks with occlusions, and performs
comparably on another benchmark free of occlusions.
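The two-level hierarchy can be sketched as a density over a joint's location: a visibility weight gates between a single Gaussian (visible case) and a Gaussian mixture covering the multiple occluded modes. A minimal numpy sketch with illustrative names, not the paper's exact parameterization:

```python
import numpy as np

def gauss(x, mu, sigma):
    """Isotropic Gaussian density at x."""
    d = x - mu
    return np.exp(-0.5 * np.sum(d * d) / sigma**2) / (
        (2 * np.pi) ** (len(x) / 2) * sigma ** len(x))

def hmdn_density(x, w_vis, mu_vis, sigma_vis, pis, mus, sigmas):
    """Two-level mixture: with probability w_vis the joint is visible and
    modelled by one Gaussian; otherwise a K-component Gaussian mixture
    covers the occluded modes. All parameter names are illustrative."""
    occ = sum(p * gauss(x, m, s) for p, m, s in zip(pis, mus, sigmas))
    return w_vis * gauss(x, mu_vis, sigma_vis) + (1 - w_vis) * occ
```

In the actual network, a CNN would regress all of these parameters from the depth image, and both branches are differentiable, which is what makes the framework end-to-end trainable.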
Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection
A human action can be seen as transitions between one's body poses over time,
where the transition depicts a temporal relation between two poses. Recognizing
actions thus involves learning a classifier sensitive to these pose transitions
as well as to static poses. In this paper, we introduce a novel method called
transition forests, an ensemble of decision trees that learns to discriminate
both static poses and transitions between pairs of independent
frames. During training, node splitting is driven by alternating two criteria:
the standard classification objective that maximizes the discrimination power
in individual frames, and the proposed one in pairwise frame transitions.
Growing the trees tends to group frames that have similar associated
transitions and share the same action label, incorporating temporal information
that would otherwise be unavailable. Unlike conventional decision trees, where the best
split in a node is determined independently of other nodes, the transition
forest finds the best split of nodes jointly (within a layer) to incorporate
distant node transitions. To infer the class label of a new frame, the frame is
passed down the trees, and the prediction is made from the previous frames'
predictions together with the current one in an efficient, online manner. We
apply our method to a variety of skeleton-based action recognition and online
detection datasets, showing its advantage over several baselines and
state-of-the-art approaches. Comment: to appear in CVPR 201
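The alternating split criteria can both be illustrated with a standard information-gain computation, applied either to per-frame class labels or to the labels attached to frame-pair transitions. A sketch assuming discrete labels, not the authors' implementation:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a discrete label array."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def split_gain(labels, mask, pair_labels=None):
    """Information gain of a candidate split (mask=True goes left).
    With pair_labels (action labels of (frame_t, frame_t') pairs) the
    same measure scores how well the split groups frames with similar
    transitions; plain labels give the standard per-frame criterion."""
    y = pair_labels if pair_labels is not None else labels
    left, right = y[mask], y[~mask]
    n = len(y)
    return entropy(y) - (len(left) / n) * entropy(left) \
                      - (len(right) / n) * entropy(right)
```

Training would alternate which label set drives the gain, which is how the trees end up sensitive to both static poses and pose transitions.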
Deep Convolutional Decision Jungle for Image Classification
We propose a novel method called deep convolutional decision jungle (CDJ) and
its learning algorithm for image classification. The CDJ maintains the
structure of standard convolutional neural networks (CNNs), i.e. multiple
layers of multiple response maps fully connected. Each response map, or node, in
both the convolutional and fully-connected layers selectively responds to class
labels, such that each data sample travels along a specific soft route of those
activated nodes. The proposed CDJ automatically learns features, whereas
decision forests and jungles require pre-defined feature sets. Compared to
CNNs, the method embeds the benefits of using data-dependent discriminative
functions, which better handle multi-modal/heterogeneous data; further, the
method offers more diverse sparse network responses, which in turn can be used
for cost-effective learning/classification. The network is learnt by combining
the conventional softmax loss and the proposed entropy loss in each layer. The
entropy loss, as used in decision-tree growing, measures the purity of data activation
according to the class label distribution. The back-propagation rule for the
proposed loss function is derived from stochastic gradient descent (SGD)
optimization of CNNs. We show that our proposed method outperforms
state-of-the-art methods on three public image classification benchmarks and
one face verification dataset. We also demonstrate the use of auxiliary data
labels, when available, which helps our method to learn more discriminative
routing and representations and leads to improved classification.
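The entropy loss can be sketched as the impurity of the class distribution routed through each node, weighting each sample's class by its activation at that node. An illustrative numpy sketch of the idea, not the paper's exact formulation:

```python
import numpy as np

def activation_entropy_loss(activations, labels, n_classes, eps=1e-9):
    """Average per-node entropy of the activation-weighted class
    distribution: pure nodes (activated by one class only) score near
    zero, mixed nodes score high, as in decision-tree growing.
    activations: (batch, n_nodes) non-negative responses."""
    loss = 0.0
    for node_act in activations.T:          # one column per node
        w = np.maximum(node_act, 0.0)
        dist = np.zeros(n_classes)
        for wi, yi in zip(w, labels):       # weight classes by activation
            dist[yi] += wi
        dist = dist / (dist.sum() + eps)
        loss += -np.sum(dist * np.log(dist + eps))
    return loss / activations.shape[1]
```

Minimizing this term alongside the softmax loss pushes each node towards responding to a subset of classes, which yields the class-selective soft routes described above.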
The factor bias of technical change and technology adoption under uncertainty
This dissertation examines the impact of uncertainty on the factor bias of technical change and technology adoption behavior. An Ito stochastic control model, characterized by endogenous factor-augmenting technical change, is developed to investigate the relationship between uncertainty and the bias of technical change.

The results show that, if a risk-averse firm faces input price uncertainty, technical change will be biased toward the input that has the more certain price. Output price uncertainty does not affect the direction of technical change bias, but it does affect the degree of bias. Under both output price uncertainty and input price uncertainty, technical change may be biased toward the input with the certain price if the contemporaneous correlation coefficient between the two processes is negative or insignificantly positive; by contrast, if the coefficient is significantly positive, technical change may be biased toward the input with the uncertain price.

It is also shown that, under production uncertainty, technical progress will be biased toward risk-reducing inputs and against risk-increasing inputs. The degree of technical change bias increases as riskiness increases or as the firm becomes more risk averse.

The model is extended to incorporate hedging via forward contracts. Under output price uncertainty, the existence of forward markets has no effect on the direction of technical change bias but does affect its degree. Under both output price uncertainty and input price uncertainty, if the forward market is unbiased, technical change will be biased toward the input with the certain price.

This dissertation also examines the effect of price uncertainty on technology adoption patterns and technological change. The results indicate that a reduction in the variance of output price will increase the rate of technology adoption and the intrafirm diffusion speed of yield-increasing technologies. The opposite is true for cost-reducing technologies.
Learning Deep Convolutional Embeddings for Face Representation Using Joint Sample- and Set-based Supervision
In this work, we investigate several methods and strategies to learn deep
embeddings for face recognition, using joint sample- and set-based
optimization. We explain our framework that expands traditional learning with
set-based supervision together with the strategies used to maintain set
characteristics. We, then, briefly review the related set-based loss functions,
and subsequently propose a novel Max-Margin Loss which maximizes maximum
possible inter-class margin with assistance of Support Vector Machines (SVMs).
It implicitly pushes all samples towards the correct side of the margin, along
a vector perpendicular to the hyperplane, with a strength that grows
exponentially towards the negative side of the hyperplane. We show that the
introduced loss outperforms the previous sample-based and set-based ones in
terms of face verification on two commonly used benchmarks. Comment: 8 pages, 5 figures, 2 tables, workshop pape
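The exponential push described above can be sketched as follows. This assumes a separating hyperplane `(w, b)` is already available (e.g. fit by an SVM between two identities' embedding sets); the function and parameter names are illustrative, not the paper's:

```python
import numpy as np

def max_margin_loss(feats, signs, w, b):
    """Sketch of an SVM-assisted max-margin loss: each embedding is
    pushed to its correct side of the hyperplane (w, b), with a penalty
    that decays exponentially with the signed distance, i.e. grows
    exponentially on the wrong (negative) side.
    feats: (n, d) embeddings; signs: +1/-1 per sample."""
    w = w / np.linalg.norm(w)               # unit normal to the plane
    margins = signs * (feats @ w + b)       # signed distance, correct side > 0
    return np.mean(np.exp(-margins))        # exponential push
```

The gradient of each term points along the hyperplane normal, which matches the "vector perpendicular to the hyperplane" intuition in the abstract.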
Spatial Attention Deep Net with Partial PSO for Hierarchical Hybrid Hand Pose Estimation
Discriminative methods often generate kinematically implausible hand poses;
generative methods are then used to correct (or verify) these results in a
hybrid method. Estimating 3D hand pose in a hierarchy, where the
high-dimensional output space is decomposed into smaller ones, has been shown
to be effective. Existing hierarchical methods mainly focus on the decomposition of
the output space while the input space remains almost the same along the
hierarchy. In this paper, a hybrid hand pose estimation method is proposed by
applying the kinematic hierarchy strategy to the input space (as well as the
output space) of the discriminative method by a spatial attention mechanism and
to the optimization of the generative method by hierarchical Particle Swarm
Optimization (PSO). The spatial attention mechanism integrates cascaded and
hierarchical regression into a CNN framework by transforming both the input
(and feature) space and the output space, which greatly reduces the viewpoint and
articulation variations. Between the levels of the hierarchy, the hierarchical
PSO enforces kinematic constraints on the results of the CNNs. The
experimental results show that our method significantly outperforms four
state-of-the-art methods and three baselines on three public benchmarks. Comment: The work is accepted by ECCV2016, Demo video:
https://youtu.be/2Hg0c88rHkk, Project Page:
https://sites.google.com/site/qiyeincv/home/eccv201
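The PSO refinement step can be illustrated with a minimal particle swarm optimiser that searches a pose inside box kinematic limits; applying it per level of the hand hierarchy, and the paper's exact update rules and energy function, are not reproduced here:

```python
import numpy as np

def pso(objective, lo, hi, n_particles=20, iters=50, seed=0):
    """Minimal PSO minimising `objective` over the box [lo, hi]
    (standing in for one level's kinematic joint limits). Standard
    inertia + cognitive + social velocity update; illustrative only."""
    rng = np.random.default_rng(seed)
    dim = len(lo)
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pval = x.copy(), np.array([objective(p) for p in x])
    g = pbest[pval.argmin()].copy()         # global best particle
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)          # stay within joint limits
        val = np.array([objective(p) for p in x])
        better = val < pval
        pbest[better], pval[better] = x[better], val[better]
        g = pbest[pval.argmin()].copy()
    return g
```

In the hybrid setting, `objective` would measure the discrepancy between the rendered hand hypothesis and the observed depth map, with the CNN output seeding the swarm.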
Iterative Hough Forest with Histogram of Control Points for 6 DoF Object Registration from Depth Images
State-of-the-art techniques proposed for 6D object pose recovery depend on
occlusion-free point clouds to accurately register objects in 3D space. To
reduce this dependency, we introduce a novel architecture called Iterative
Hough Forest with Histogram of Control Points that is capable of estimating
occluded and cluttered objects' 6D pose given a candidate 2D bounding box. Our
Iterative Hough Forest is learnt using patches extracted only from the positive
samples. These patches are represented with Histogram of Control Points (HoCP),
a "scale-variant" implicit volumetric description, which we derive from
recently introduced Implicit B-Splines (IBS). The rich discriminative
information provided by this scale-variance is leveraged during inference,
where the initial pose estimation of the object is iteratively refined based on
more discriminative control points by using our Iterative Hough Forest. We
conduct experiments on several test objects of a publicly available dataset to
test our architecture and to compare with the state-of-the-art. Comment: IROS 201
Task-Oriented Hand Motion Retargeting for Dexterous Manipulation Imitation
Human hand actions are quite complex, especially when they involve object
manipulation, mainly due to the high dimensionality of the hand and the vast
action space this entails. Imitating these actions with dexterous hand models
involves different important and challenging steps: acquiring human hand
information, retargeting it to a hand model, and learning a policy from
acquired data. In this work, we capture the hand information by using a
state-of-the-art hand pose estimator. We tackle the retargeting problem from
the hand pose to a 29 DoF hand model by combining inverse kinematics and PSO
with a task objective optimisation. This objective encourages the virtual hand
to accomplish the manipulation task, relieving the effect of the estimator's
noise and the domain gap. Our approach leads to a better success rate in the
grasping task compared to our inverse kinematics baseline, allowing us to
record successful human demonstrations. Furthermore, we use these
demonstrations to learn a policy network using generative adversarial imitation
learning (GAIL) that is able to autonomously grasp an object in the virtual
space. Comment: ECCV 2018 workshop pape
Siamese Regression Networks with Efficient mid-level Feature Extraction for 3D Object Pose Estimation
In this paper we tackle the problem of estimating the 3D pose of object
instances, using convolutional neural networks. State of the art methods
usually solve the challenging problem of regression in angle space indirectly,
focusing on learning discriminative features that are later fed into a separate
architecture for 3D pose estimation. In contrast, we propose an end-to-end
learning framework for directly regressing object poses by exploiting Siamese
Networks. For a given image pair, we enforce a similarity measure between the
representation of the sample images in the feature and pose space respectively,
that is shown to boost regression performance. Furthermore, we argue that our
pose-guided feature learning using our Siamese Regression Network generates
more discriminative features that outperform the state of the art. Last, our
feature learning formulation provides the ability of learning features that can
perform under severe occlusions, demonstrating high performance on our novel
hand-object dataset. Comment: 9 pages, paper submitted to NIPS 2016, project page:
http://www.iis.ee.ic.ac.uk/rkouskou/research/SRN.htm
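The pose-guided similarity term can be sketched as encouraging the distance between two samples' features to match the distance between their poses, used alongside a direct pose-regression loss. An illustrative numpy sketch with assumed names, not the paper's exact loss:

```python
import numpy as np

def siamese_pose_loss(f1, f2, p1, p2, gamma=1.0):
    """Similarity term for one image pair in the Siamese setting: the
    feature-space distance is pulled towards the pose-space distance,
    so that the learnt features become pose-discriminative. gamma is
    an assumed weighting hyperparameter."""
    feat_d = np.linalg.norm(f1 - f2)
    pose_d = np.linalg.norm(p1 - p2)
    return gamma * (feat_d - pose_d) ** 2
```

Pairs with similar poses are thus pulled together in feature space and pairs with dissimilar poses pushed apart, which is the claimed source of the regression boost.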